The Replace Operator

Author

Lauri Karttunen

Abstract

This paper introduces to the calculus of regular expressions a replace operator and defines a set of replacement expressions that concisely encode alternate variations of the operation. Replace expressions denote regular relations, defined in terms of other regular expression operators. The basic case is unconditional obligatory replacement. We develop several versions of conditional replacement that allow the operation to be constrained by context.

0. Introduction

Linguistic descriptions in phonology, morphology, and syntax typically make use of an operation that replaces some symbol or sequence of symbols by another sequence or symbol. We consider here the replacement operation in the context of finite-state grammars.

Our purpose in this paper is twofold. The first objective is to define replacement in a very general way, explicitly allowing replacement to be constrained by input and output contexts, as in two-level rules (Koskenniemi 1983), but without the restriction of only single-symbol replacements. The second objective is to define replacement within a general calculus of regular expressions so that replacements can be conveniently combined with other kinds of operations, such as composition and union, to form complex expressions.

Our replacement operators are close relatives of the rewrite operator defined in Kaplan and Kay 1994, but they are not identical to it. We discuss their relationship in a section at the end of the paper.

0.1. Simple regular expressions

The replacement operators are defined by means of regular expressions. Some of the operators we use to define them are specific to Xerox implementations of the finite-state calculus, but equivalent formulations could easily be found in other notations. The table below describes the types of expressions and special symbols that are used to define the replacement operators.
[1]
(A)       option (union of A with the empty string)
~A        complement (negation)
\A        term complement (any symbol other than A)
$A        contains (all strings containing at least one A)
A*        Kleene star
A+        Kleene plus
A/B       ignore (A interspersed with strings from B)
A B       concatenation
A | B     union
A & B     intersection
A - B     relative complement (minus)
A .x. B   crossproduct (Cartesian product)
A .o. B   composition

Square brackets, [], are used for grouping expressions. Thus [A] is equivalent to A while (A) is not.

The order in the above table corresponds to the precedence of the operations. The prefix operators (~, \, and $) bind more tightly than the postfix operators (*, +, and /), which in turn rank above concatenation. Union, intersection, and relative complement are considered weaker than concatenation but stronger than crossproduct and composition. Operators sharing the same precedence are interpreted left-to-right. Our new replacement operator goes in a class between the Boolean operators and composition. Taking advantage of all these conventions, the fully bracketed expression

[2] [[[~[a]]* [[b]/x]] | c] .x. d ;

can be rewritten more concisely as

[3] ~a* b/x | c .x. d ;

Expressions that contain the crossproduct (.x.) or the composition (.o.) operator describe regular relations rather than regular languages. A regular relation is a mapping from one regular language to another one. Regular languages correspond to simple finite-state automata; regular relations are modeled by finite-state transducers. In the relation A .x. B, we call the first member, A, the upper language and the second member, B, the lower language.

To make the notation less cumbersome, we systematically ignore the distinction between the language A and the identity relation that maps every string of A to itself. Correspondingly, a simple automaton may be thought of as representing a language or as a transducer for its identity relation.
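As a rough illustration of the relational reading of .x. and .o., the following Python sketch models languages as finite sets of strings and relations as finite sets of (upper, lower) pairs. This is only an assumption made for illustration: regular languages are in general infinite and are represented by automata, not by enumerated sets, and the function names (crossproduct, identity, compose, upper_language, lower_language) are hypothetical, not part of any Xerox tool.

```python
from itertools import product


def crossproduct(upper, lower):
    """A .x. B on finite languages: pair every upper string with every lower string."""
    return {(u, l) for u, l in product(upper, lower)}


def identity(language):
    """The identity relation that maps every string of the language to itself."""
    return {(s, s) for s in language}


def compose(r1, r2):
    """R1 .o. R2: (u, l) is included when some middle string m links (u, m) and (m, l)."""
    return {(u, l) for (u, m) in r1 for (m2, l) in r2 if m == m2}


def upper_language(relation):
    """First members of all pairs in the relation."""
    return {u for u, _ in relation}


def lower_language(relation):
    """Second members of all pairs in the relation."""
    return {l for _, l in relation}


if __name__ == "__main__":
    rel = crossproduct({"a", "b"}, {"x"})
    print(sorted(rel))
    # Composing with the identity relation on {"a"} restricts the upper side to "a".
    print(sorted(compose(identity({"a"}), rel)))
```

Treating a language as its identity relation, as the text does, is what lets a plain language appear on either side of a composition in this sketch.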
For the sake of convenience, we also equate a language consisting of a single string with the string itself. Thus the expression abc may denote, depending on the context, (i) the string abc, (ii) the language consisting of the string abc, and (iii) the identity relation on that language.

We recognize two kinds of symbols: simple symbols (a, b, c, etc.) and fst pairs (a:b, y:z, etc.). An fst pair a:b can be thought of as the crossproduct of a and b, the minimal relation consisting of a (the upper symbol) and b (the lower symbol). Because we regard the identity relation on A as equivalent to A, we write a:a as just a. There are two special symbols:

[4]
0         epsilon (the empty string)
?         any symbol in the known alphabet and its extensions

The escape character, %, allows letters that have a special meaning in the calculus to be used as ordinary symbols. Thus %& denotes a literal ampersand as opposed to &, the intersection operator; %0 is the ordinary zero symbol.

The following simple expressions appear frequently in our formulas:

[5]
[]        the empty string language
~$[]      the null set
?*        the universal ("sigma-star") language: all possible strings of any length, including the empty string

1. Unconditional replacement

To the regular-expression language described above, we add the new replacement operator. The unconditional replacement of UPPER by LOWER is written

[6] UPPER -> LOWER

Here UPPER and LOWER are any regular expressions that describe simple regular languages. We define this replacement expression as

[7] [ NO_UPPER [UPPER .x. LOWER] ]* NO_UPPER ;

where NO_UPPER abbreviates ~$[UPPER - []]. The definition describes a regular relation whose members contain any number (including zero) of iterations of [UPPER .x. LOWER], possibly alternating with strings not containing UPPER that are mapped to themselves.

1.1. Examples

We illustrate the meaning of the replacement operator with a few simple examples.
The regular expression

[8] a b | c -> x ;

(same as [[a b] | c] -> x) describes a relation consisting of an infinite set of pairs such as

[9]
a b a c a
x   a x a

where all occurrences of ab and c are mapped to x interspersed with unchanging pairings. It also includes all possible pairs like

[10]
x a x a
x a x a

that do not contain either ab or c anywhere.

Figure 1 shows the state diagram of a transducer that encodes this relation. The transducer consists of states and arcs that indicate a transition from state to state over a given pair of symbols. For convenience we represent identity pairs by a single symbol; for example, we write a:a as a. The symbol ? represents here the identity pairs of symbols that are not explicitly present in the network. In this case, ? stands for any identity pair other than a:a, b:b, c:c, and x:x. Transitions that differ only with respect to the label are collapsed into a single multiply labelled arc. The state labeled 0 is the start state. Final states are distinguished by a double circle.
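The pairs in [9] and [10] can be checked with a small Python sketch that follows definition [7] directly. It is not a transducer compiler. It assumes UPPER and LOWER are finite sets of strings, that every UPPER string is non-empty, and that NO_UPPER means "contains no UPPER string", as the prose states. The names contains_upper and replace_outputs are hypothetical helpers introduced only for this illustration.

```python
from functools import lru_cache


def contains_upper(segment, upper):
    """True if the segment contains some non-empty UPPER string as a substring."""
    return any(u and u in segment for u in upper)


def replace_outputs(text, upper, lower):
    """Return every lower string that UPPER -> LOWER pairs with `text`.

    The input is factored as [NO_UPPER UPPER]* NO_UPPER; each UPPER factor
    is rewritten to a LOWER string and each NO_UPPER factor is copied.
    """
    upper = frozenset(u for u in upper if u)  # assumption: ignore empty UPPER strings
    lower = frozenset(lower)

    @lru_cache(maxsize=None)
    def outputs_from(i):
        results = set()
        # Zero further iterations: the rest of the input must be NO_UPPER.
        if not contains_upper(text[i:], upper):
            results.add(text[i:])
        # One more iteration: NO_UPPER segment text[i:j], then an UPPER string at j.
        for j in range(i, len(text) + 1):
            segment = text[i:j]
            if contains_upper(segment, upper):
                break  # any longer segment would also contain UPPER
            for u in upper:
                if text.startswith(u, j):
                    for tail in outputs_from(j + len(u)):
                        for l in lower:
                            results.add(segment + l + tail)
        return frozenset(results)

    return set(outputs_from(0))


if __name__ == "__main__":
    print(replace_outputs("abaca", {"ab", "c"}, {"x"}))  # the pair in [9]
    print(replace_outputs("xaxa", {"ab", "c"}, {"x"}))   # the pair in [10]
```

Because the sketch enumerates all factorizations permitted by [7], an input with overlapping UPPER occurrences can yield more than one output: with UPPER = aa and LOWER = x, the input aaa is paired with both xa and ax.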





Publication date: 1995
